Abstract: In this article, we investigate the properties of the EBIC in variable selection for generalized linear models with non-canonical links and a diverging number of parameters in ultra-high dimensional feature space. The selection consistency of the EBIC in this situation is established under moderate conditions. The finite sample performance of the EBIC coupled with a forward selection procedure is demonstrated through simulation studies and a real data analysis.
1 Introduction

Variable selection is a primary concern in many important contemporary scientific fields such as signal processing, medical research and genetic studies. In these fields, a relatively small set of relevant variables usually needs to be selected from a huge collection of available variables. For example, in genome-wide association studies (GWAS), to identify loci or genes that affect a quantitative trait or a disease status, hundreds of thousands, even millions, of single nucleotide polymorphisms (SNPs) are under consideration. The number of variables is much larger than the sample size in such studies. This phenomenon is referred to as small-n-large-p. Variable selection in small-n-large-p problems poses a great challenge.

A major approach to variable selection is model based; that is, a model is formulated to describe the relationship between a response variable (e.g., the measurement of a quantitative trait) and a set of predictor variables or covariates (e.g., the genotypes of SNPs), and the covariates are selected by a certain variable selection criterion. The variable selection criterion is crucial in model-based variable selection. Traditional variable selection criteria such as Akaike's Information Criterion (AIC) (Akaike, 1973), the Bayesian Information Criterion (BIC) (Schwarz, 1978) and Cross Validation (CV) (Stone, 1974) are no longer appropriate for variable selection in small-n-large-p problems. These traditional criteria tend to select too many irrelevant covariates because they are generally not selection consistent. Recently, some BIC-type criteria have been proposed for small-n-large-p problems. Bogdan et al. (2004) considered a criterion called modified BIC (mBIC) for QTL mapping models. Wang et al. (2009) studied another modified BIC for models with a diverging number of parameters. Chen and Chen (2008) extended the original BIC to a family called extended BIC (EBIC) governed by a parameter $\gamma$.

The criterion considered by Wang et al. (2009) modifies the original BIC by multiplying the second term of BIC by a diverging parameter and is somewhat ad hoc. To achieve selection consistency, it requires $p/n^{\xi} < 1$ for some $0 < \xi < 1$, and hence is not applicable when $p > n$. The mBIC and EBIC considered by Bogdan et al. (2004) and Chen and Chen (2008), respectively, are developed from a Bayesian framework. For the mBIC, a binomial prior on the number of covariates is imposed on each model. For the EBIC, the prior on a model is proportional to a power of the size of the model class to which the model belongs. Asymptotically, mBIC is a special case of EBIC corresponding to $\gamma = 1$. The selection consistency of EBIC for linear models with a fixed number of parameters is established in Chen and Chen (2008). The result is then extended to generalized linear models (GLIMs) with canonical links in Chen and Chen (2012). The EBIC has been used for choosing tuning parameters in penalized likelihood approaches (see Huang et al., 2010), for feature selection procedures (see Wang, 2009; Luo and Chen, 2011), and for QTL mapping and disease gene mapping studies (see Li and Chen, 2009; Zhao and Chen, 2012).

In GLIMs, canonical links do not always provide the best fit. Generally, there is no a priori reason why a canonical link should be used, and in many cases a non-canonical link is preferable; see McCullagh and Nelder (1989) and Czado and Munk (2000). In many contemporary scientific fields, such as those mentioned at the beginning of this article, it has become the norm that the number of covariates under consideration is so large that it can be regarded as being of exponential order in the sample size. This is referred to as the case of ultra-high dimensional feature space. In problems such as QTL and disease gene mapping, a quantitative trait or disease status is usually affected by many loci. Except for a few so-called major genes, most of the loci have only a small effect, which cannot be detected when the sample size is small. As the sample size increases, so does the number of such detectable effects. This phenomenon is mathematically well modeled by a diverging number of parameters, i.e., the number of truly relevant covariates diverges as the sample size increases. Therefore, GLIMs with non-canonical links and a diverging number of parameters in ultra-high dimensional feature space are appealing. In this article, we investigate the properties of EBIC for such models and establish its selection consistency. The selection consistency of EBIC for GLIMs with canonical links does not carry over trivially to the case of non-canonical links. The selection consistency in the case of non-canonical links is established under more general conditions than those in Chen and Chen (2012). These conditions, though general, are naturally satisfied by many popular examples given in Wedderburn (1976). We also present a forward selection procedure with EBIC for GLIMs. This procedure is applied in simulation studies and a real data analysis to evaluate its validity.

The remainder of this article is organized as follows. In Section 2, the main results are presented and discussed. In Section 3, simulation studies are reported and analyzed. In Section 4, the forward selection procedure with EBIC is applied to analyze a well-known leukemia data set published in Golub et al. (1999). All technical proofs are provided in the Appendix.

2 Selection Consistency of EBIC for GLIM with non-canonical links

Let $(y_i, \mathbf{x}_i)$, $i = 1, \dots, n$, be the observations, where $y_i$ is a response variable and $\mathbf{x}_i = (x_{i1}, \dots, x_{ip_n})^\top$ is a $p_n$-vector of covariates. We consider the generalized linear model (GLIM) below:

$$y_i \sim f(y_i; \theta_i) = \exp\{\theta_i y_i - b(\theta_i)\} \ \text{ w.r.t. } \nu, \quad i = 1, \dots, n,$$

where $\nu$ is a $\sigma$-finite measure. From the properties of the exponential family, we have

$$\mu(\theta_i) = E(y_i) = b'(\theta_i), \qquad \sigma^2(\theta_i) = \mathrm{Var}(y_i) = b''(\theta_i),$$

where $b'$ and $b''$ are the first and second derivatives of $b$, respectively. The $\theta_i$ is related to $\mathbf{x}_i$ through the relationship

$$g(\mu(\theta_i)) = \eta_i = \mathbf{x}_i^\top \beta,$$

where $g$ is a monotone function called the link function and $\beta$ is a $p_n$-dimensional parameter vector. If $g(\mu(\theta_i)) = \theta_i$, i.e., $g = \mu^{-1}$, the link is called the canonical link. In this article, we consider general link functions including the canonical link. Because of the one-to-one correspondence between $\theta_i$ and $\eta_i$, there is a function $h$ such that $\theta_i = h(\eta_i) = h(\mathbf{x}_i^\top\beta)$. If $g$ is twice differentiable, so is $h$. Thus the probability density function of $y_i$ can be expressed as

$$f(y_i; h(\mathbf{x}_i^\top\beta)) = \exp\{y_i h(\mathbf{x}_i^\top\beta) - b(h(\mathbf{x}_i^\top\beta))\}.$$

In the above GLIM, we assume that $p_n = O(\exp\{n^{\kappa}\})$, $0 < \kappa < 1$, and that only a relatively small number of components of $\beta$ are nonzero. Throughout the article, the following notation and conventions are used. Denote by $s$ any subset of the index set $\mathcal{S} = \{1, 2, \dots, p_n\}$ and by $|s|$ its cardinality. For convenience, $s$ is used interchangeably to denote both an index set and the set of covariates with indices in that index set, and is also referred to as a model, i.e., the GLIM consisting of the covariates in $s$. Let $s_{0n} = \{j : \beta_j \neq 0, \ j = 1, \dots, p_n\}$ and $p_{0n} = |s_{0n}|$. The covariates belonging to $s_{0n}$ are called relevant features and the others irrelevant features; $s_{0n}$ is also referred to as the true model. Let $X = (\mathbf{x}_1, \dots, \mathbf{x}_n)^\top$. Denote by $X(s)$ the submatrix formed by the columns of $X$ whose indices fall into $s$. Let $\mathbf{x}_i^s$ be the vector consisting of the components of $\mathbf{x}_i$ whose indices belong to $s$, and let $\beta^s$ be the corresponding subvector of $\beta$. Let $\mathcal{S}_j$ denote the set of $\binom{p_n}{j}$ combinations of $j$ indices from $\mathcal{S}$. Denote by $\tau(\mathcal{S}_j)$ the size of $\mathcal{S}_j$, that is, $\tau(\mathcal{S}_j) = \binom{p_n}{j}$.

The EBIC of a model $s$, as defined in Chen and Chen (2008), is

$$\mathrm{EBIC}_\gamma(s) = -2\ln L_n(\hat\beta^s) + |s|\ln n + 2\gamma\ln\tau(\mathcal{S}_{|s|}), \quad \gamma \ge 0,$$

where $L_n(\hat\beta^s)$ is the maximum likelihood of model $s$ and $\hat\beta^s$ is the maximum likelihood estimate (MLE) of $\beta^s$.
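As a minimal sketch of the definition above, the following Python function evaluates $\mathrm{EBIC}_\gamma(s)$ from a model's maximized log-likelihood. It computes $\ln\tau(\mathcal{S}_{|s|}) = \ln\binom{p_n}{|s|}$ through `math.lgamma`, so that large $p_n$ does not overflow. The function and argument names are illustrative only and are not taken from any published implementation.

```python
import math


def log_binom(p: int, j: int) -> float:
    """ln C(p, j) = ln tau(S_j): log-number of models of size j among p covariates."""
    return math.lgamma(p + 1) - math.lgamma(j + 1) - math.lgamma(p - j + 1)


def ebic(max_loglik: float, model_size: int, n: int, p: int, gamma: float) -> float:
    """EBIC_gamma(s) = -2 ln L_n(beta_hat^s) + |s| ln n + 2 gamma ln tau(S_|s|).

    max_loglik is the maximised log-likelihood of model s, model_size is |s|,
    n is the sample size and p the number of candidate covariates.
    """
    if not 0 <= model_size <= p:
        raise ValueError("model_size must lie in [0, p]")
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    return (-2.0 * max_loglik
            + model_size * math.log(n)
            + 2.0 * gamma * log_binom(p, model_size))
```

With `gamma=0` the function reduces to the ordinary BIC; larger `gamma` adds a penalty that grows with the number of competing models of the same size.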

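Denote by $l_n(\beta^s)$, $s_n(\beta^s)$ and $H_n(\beta^s)$ the log-likelihood function, the score vector and the (negative) Hessian matrix of the model $s$, respectively. Suppose the link function $g$ is twice differentiable. Then we have

$$l_n(\beta^s) = \sum_{i=1}^n \big[y_i h(\mathbf{x}_i^{s\top}\beta^s) - b(h(\mathbf{x}_i^{s\top}\beta^s))\big],$$

$$s_n(\beta^s) = \frac{\partial l_n(\beta^s)}{\partial\beta^s} = \sum_{i=1}^n \big[y_i - \mu(h(\mathbf{x}_i^{s\top}\beta^s))\big]\, h'(\mathbf{x}_i^{s\top}\beta^s)\,\mathbf{x}_i^s,$$

$$\begin{aligned}
H_n(\beta^s) = -\frac{\partial^2 l_n(\beta^s)}{\partial\beta^s\,\partial\beta^{s\top}}
&= \sum_{i=1}^n \Big\{\mu'(h(\mathbf{x}_i^{s\top}\beta^s))\big[h'(\mathbf{x}_i^{s\top}\beta^s)\big]^2 - \big[y_i - \mu(h(\mathbf{x}_i^{s\top}\beta^s))\big]\, h''(\mathbf{x}_i^{s\top}\beta^s)\Big\}\,\mathbf{x}_i^s\mathbf{x}_i^{s\top} \\
&= H_{n1}(\beta^s) - H_{n0}(\beta^s), \ \text{say}.
\end{aligned}$$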

Let $\beta_0$ denote the true value of $\beta$. When $s_{0n} \subseteq s$, we simply denote $\mu_i = \mu(h(\mathbf{x}_i^{s\top}\beta_0^s))$ and $\sigma_i^2 = \mu'(h(\mathbf{x}_i^{s\top}\beta_0^s))$. The major difference between the case of canonical links and the case of non-canonical links is as follows. If $g$ is the canonical link, $h$ is the identity, so $h' = 1$ and $h'' = 0$; hence $H_{n0} = 0$ and $H_n(\beta^s)$ is positive definite when $X(s)$ is of full column rank. Therefore, $l_n(\beta^s)$ is a strictly concave function of $\beta^s$. But if $g$ is a non-canonical link, $H_n(\beta^s)$ is not necessarily positive definite. As a consequence, $l_n(\beta^s)$ is not necessarily concave, and the maximum likelihood estimate of $\beta^s$ does not necessarily exist. We will show that $H_{n0}(\beta^s)$ is asymptotically negligible (Lemma 1) for $\beta^s$ in a neighborhood of the true parameter value of the GLIM. Thus $H_n(\beta^s)$ is asymptotically locally positive definite. To guarantee the existence of the MLE of $\beta^s$ for finite samples, we assume that the link function $g$ is chosen such that $l_n(\beta^s)$ has a unique maximum. We now state the conditions required for the selection consistency of the EBIC.
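To make the decomposition $H_n = H_{n1} - H_{n0}$ concrete, the sketch below evaluates the log-likelihood, the score and both Hessian pieces for a Bernoulli GLIM with the complementary log-log link (the link used in the simulations of Section 3). For this link $\mu(\eta) = 1 - \exp(-e^{\eta})$, $h(\eta) = \mathrm{logit}(\mu(\eta))$, and, by our own derivation for this illustration, $h'(\eta) = e^{\eta}/\mu$ and $h''(\eta) = h'(\eta) - e^{2\eta}(1-\mu)/\mu^2$. The simulated data are hypothetical. Under a canonical link one would have $h' = 1$ and $h'' = 0$, so `H0` would vanish; here it does not.

```python
import numpy as np


def cloglog_pieces(eta):
    """mu(eta), h'(eta), h''(eta) for a Bernoulli GLIM with cloglog link.

    theta = h(eta) = logit(mu(eta)) with mu(eta) = 1 - exp(-exp(eta)).
    """
    e = np.exp(eta)
    mu = -np.expm1(-e)                       # 1 - exp(-e^eta), computed stably
    h1 = e / mu                              # h'(eta)
    h2 = h1 - e * e * (1.0 - mu) / mu**2     # h''(eta)
    return mu, h1, h2


def loglik(beta, X, y):
    """l_n(beta) = sum_i [y_i h(eta_i) - b(h(eta_i))] = sum_i [y log mu + (1 - y) log(1 - mu)]."""
    mu, _, _ = cloglog_pieces(X @ beta)
    return float(np.sum(y * np.log(mu) + (1 - y) * np.log1p(-mu)))


def score_and_hessians(beta, X, y):
    """Return s_n(beta), H_n1(beta), H_n0(beta); H_n = H_n1 - H_n0 is minus the Hessian of l_n."""
    mu, h1, h2 = cloglog_pieces(X @ beta)
    sigma2 = mu * (1.0 - mu)                 # mu'(theta) = b''(theta) for the Bernoulli
    resid = y - mu
    score = X.T @ (resid * h1)
    H1 = (X * (sigma2 * h1**2)[:, None]).T @ X
    H0 = (X * (resid * h2)[:, None]).T @ X
    return score, H1, H0


# Hypothetical data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
beta0 = np.array([0.5, -0.3, 0.2])
y = rng.binomial(1, cloglog_pieces(X @ beta0)[0])
s, H1, H0 = score_and_hessians(beta0, X, y)
```

Comparing `s` and `H1 - H0` against finite differences of `loglik` is a quick way to check the analytic derivatives; `H1` is positive definite while `H0` is a random, sign-indefinite term driven by the residuals $y_i - \mu_i$.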

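C1 $\ln p_n = O(n^{\kappa})$ and $p_{0n} = O(n^{\delta})$, where $\delta > 0$, $\kappa > 0$ and $\delta + \kappa < 1/3$;

C2 $\min_{j \in s_{0n}} |\beta_j| \ge C n^{-1/3}$ for some constant $C > 0$;

C3 For any $s$, the interior of $\mathcal{B}(s) = \{\beta^s : \int \exp\{y\, h(\mathbf{x}_i^{s\top}\beta^s)\}\, d\nu(y) < \infty, \ i = 1, 2, \dots, n\}$ is not empty. Let $\beta_0$ denote the true parameter of the GLIM. If $|s| \le k p_{0n}$, where $k > 1$, then $\beta_0^s$ is in the interior of $\mathcal{B}(s)$;

C4 There exist positive constants $c_1$ and $c_2$ such that, for all sufficiently large $n$,

$$c_1 n \le \lambda_{\min}\big(H_{n1}(\beta_0^{s\cup s_{0n}})\big) \le \lambda_{\max}\big(H_{n1}(\beta_0^{s\cup s_{0n}})\big) \le c_2 n$$

for all $s$ with $|s| \le k p_{0n}$, where $\lambda_{\min}$ and $\lambda_{\max}$ denote the smallest and largest eigenvalues, respectively;

C5 For any given $\xi > 0$, there exists a $\delta > 0$ such that, when $n$ is sufficiently large,

$$(1 - \xi)\, H_{nj}(\beta_0^{s\cup s_{0n}}) \le H_{nj}(\beta^{s\cup s_{0n}}) \le (1 + \xi)\, H_{nj}(\beta_0^{s\cup s_{0n}}), \quad j = 0, 1,$$

whenever $\|\beta^{s\cup s_{0n}} - \beta_0^{s\cup s_{0n}}\|_2 \le \delta$, for all $s$ with $|s| \le k p_{0n}$;

C6 The quantities $|h'(\mathbf{x}_i^\top\beta_0)|$, $|h''(\mathbf{x}_i^\top\beta_0)|$ and $|x_{ij}|$, $i = 1, \dots, n$; $j = 1, \dots, p_n$, are bounded from above, and $\sigma_i^2$, $i = 1, \dots, n$, are bounded both from above and below away from zero. Furthermore,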

Conditions C2 and C3 are the same as conditions A2 and A3 in Chen and Chen (2012). Conditions C4 and C5 reduce to conditions A4 and A5 in Chen and Chen (2012) for canonical links. When A6 in Chen and Chen (2012) is satisfied, C6 is satisfied by commonly used GLIMs such as the Poisson distribution with log and power function links; the binary distribution with identity, arcsine, complementary log-log and probit links; and the Gamma distribution with log and inverse power function links. These GLIMs are thoroughly studied in Wedderburn (1976). The verification of C6 for these GLIMs is given in a complementary document at the website http://www.stat.nus.edu.sg/~stachenz/.

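We now state our main results. Define $\mathcal{A}_0 = \{s : s_{0n} \subset s, \ s_{0n} \neq s, \ |s| \le k p_{0n}\}$ and $\mathcal{A}_1 = \{s : s_{0n} \not\subseteq s, \ |s| \le k p_{0n}\}$. We have

Theorem 1. Under assumptions C1-C6, as $n \to +\infty$,

(1) $P\big(\min_{s\in\mathcal{A}_1} \mathrm{EBIC}_\gamma(s) \le \mathrm{EBIC}_\gamma(s_{0n})\big) \to 0$ for any $\gamma > 0$;

(2) $P\big(\min_{s\in\mathcal{A}_0} \mathrm{EBIC}_\gamma(s) \le \mathrm{EBIC}_\gamma(s_{0n})\big) \to 0$ for any $\gamma > \dfrac{1}{1-\varepsilon} - \dfrac{\ln n}{2\ln p_n}$,

where $\varepsilon$ is an arbitrarily small positive constant.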
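For a feel of the threshold in Theorem 1(2), the small sketch below computes $\frac{1}{1-\varepsilon} - \frac{\ln n}{2\ln p_n}$ for given $(n, p_n, \varepsilon)$. It is illustrative only: the theorem is asymptotic, the function name is our own, and the $(n, p_n)$ pairs in the loop are hypothetical.

```python
import math


def ebic_gamma_threshold(n: int, p: int, eps: float = 0.0) -> float:
    """Lower bound 1/(1 - eps) - ln(n) / (2 ln(p)) on gamma from Theorem 1(2)."""
    if n < 2 or p < 2:
        raise ValueError("need n >= 2 and p >= 2 so that the logarithms are positive")
    if not 0.0 <= eps < 1.0:
        raise ValueError("eps must lie in [0, 1)")
    return 1.0 / (1.0 - eps) - math.log(n) / (2.0 * math.log(p))


# Hypothetical (n, p_n) pairs.
for n, p in [(100, 1_000), (200, 10_000), (500, 100_000)]:
    print(n, p, round(ebic_gamma_threshold(n, p), 3))
```

For $n = 100$ and $p_n = 1000$ with $\varepsilon = 0$ the threshold is exactly $2/3$, since $\ln 100 / (2\ln 1000) = 1/3$. The bound rises towards $1/(1-\varepsilon)$ as $p_n$ grows faster than $n$, so for ultra-high dimensional problems the admissible $\gamma$ approaches the mBIC value $\gamma = 1$.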
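The following result is needed in the proof of Theorem 1.

Lemma 1. Under conditions C1-C6, whenever $\|\beta^{s\cup s_{0n}} - \beta_0^{s\cup s_{0n}}\|_2 \le \delta$, with $\delta$ as in Condition C5, we have, for any unit vector $u$,

$$u^\top H_n(\beta^{s\cup s_{0n}})\, u = u^\top H_{n1}(\beta^{s\cup s_{0n}})\, u\,\{1 + o_p(1)\},$$

uniformly in $s$ with $|s| \le k p_{0n}$.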
The above lemma implies the following result, which gives the convergence rate of the $L_2$-consistency of the MLE of $\beta^s$ when $s_{0n} \subseteq s$. The result is of independent interest.

Theorem 2. Under conditions C1-C6, as $n \to \infty$,

$$\|\hat\beta^s - \beta_0^s\|_2 = O_p(n^{-1/3})$$

uniformly for $s \in \mathcal{A}_0$.

The technical details of the proofs of the above results are given in the Appendix. Theorem 1 implies that if we confine attention to models with cardinality less than or equal to $kp_{0n}$ and select the model with the smallest EBIC among all those models, then, with probability converging to 1, the selected model, say $s^*$, will be the same as the true model $s_{0n}$. This property is called selection consistency. The constraint $|s| \le kp_{0n}$ is natural since, in practical problems, we do not need to consider any models with cardinality much larger than that of the true model. However, in practice, the evaluation of all models with cardinality up to $kp_{0n}$ is computationally infeasible. Like any other model selection criterion, the EBIC is to be used within a certain model selection procedure. In addition to the traditional forward selection procedures, a variety of procedures based on the penalized likelihood approach have been developed within the last twenty years, such as the LASSO (Tibshirani, 1996), SCAD (Fan and Li, 2001) and the Elastic Net (Zou and Hastie, 2005). A model selection criterion can be used in these procedures to choose the penalty parameter, which corresponds to choosing a model. However, though some desirable properties such as the so-called oracle property have been established for these penalized likelihood approaches under certain conditions, to our knowledge the asymptotic properties of these approaches for GLIMs with ultra-high dimensional feature space have not yet been thoroughly studied. The traditional forward selection methods have been criticized for their greedy nature. Recently, however, it has been found that the greedy nature might not be a drawback, especially when the goal is the selection of relevant variables rather than a prediction model; see, e.g., Tropp (2004), Tropp and Gilbert (2007) and Wang (2009). In this article, we consider the application of the EBIC with the traditional forward regression procedure for GLIMs in our simulation studies and real data analysis.



3 Simulation Study 

In our simulation studies, we consider a GLIM with binary response and the complementary log-log link. We take the divergent pattern $(n, p_n, p_{0n}) = (n, [40e^{\cdots}], [5n^{\cdots}])$ for $n = 100, 200, 500$. The settings for the covariates, which are adapted from Fan and Song (2010), are described below.



Setting 1. Letg = 15, si = {l,...,g}, S2 = {g + l,...,[^]},S3 = {[^] +!,•••, 



2pn 

3 



and S4 



2pn 

3 



-I- 1, . . . Let the covariate vector x be decomposed into 

X = {x^^, x^^, x^^,x^^). Assume that x'"^^ follows N{0,T,p), where Ep has diago- 
nal elements 1 and off-diagonal elements p, x^^ follows N(0,I), the components 
of x^^ are i.i.d. as a double exponential distribution with location and scale 1, 
the components of x^'^ are i.i.d. with the normal mixture ^[N{—1, 1) -|- Ar(l, 0.5)]. 
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The covariates x^'' ,i = 1, . . . , n, arc generated as i.i.d. copies of k = 1, 2, 3, 4. 
Four values of p: 0, 0.3, 0.5 and 0.7, are considered, son = {L x t,t = 1, . . . ,pon}, 
where L = 10. Pj = 1, if j = L x t with odd t, 1.3, if j = L x t with even t, 0, 
otherwise. 
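The following sketch generates one data set under Setting 1, or under Setting 2 with `L=5`. Because the exponents of the divergent pattern for $(p_n, p_{0n})$ are not reproduced here, $p_n$ and $p_{0n}$ are passed in directly. We also assume that $N(1, 0.5)$ denotes variance 0.5 and that the binary responses are drawn through the complementary log-log link stated above. Indices are 0-based in the code, so feature $j = Lt$ is column `L*t - 1`; the function name is hypothetical.

```python
import numpy as np


def simulate_setting1(n, p, p0, rho, L=10, q=15, seed=None):
    """One data set (X, beta, y) under Setting 1; use L=5 for Setting 2."""
    a, b = p // 3, 2 * p // 3                # [p/3] and [2p/3]
    if not (q < a and L * p0 <= p):
        raise ValueError("need q < [p/3] and L * p0 <= p")
    rng = np.random.default_rng(seed)
    X = np.empty((n, p))
    # s1: N(0, Sigma_rho) via a shared factor (unit variances, pairwise correlation rho)
    shared = rng.standard_normal((n, 1))
    X[:, :q] = np.sqrt(1.0 - rho) * rng.standard_normal((n, q)) + np.sqrt(rho) * shared
    # s2: independent N(0, 1)
    X[:, q:a] = rng.standard_normal((n, a - q))
    # s3: double exponential (Laplace) with location 0 and scale 1
    X[:, a:b] = rng.laplace(0.0, 1.0, size=(n, b - a))
    # s4: equal-weight mixture of N(-1, 1) and N(1, 0.5), 0.5 read as a variance
    m = p - b
    pick_first = rng.random((n, m)) < 0.5
    X[:, b:] = np.where(pick_first,
                        rng.normal(-1.0, 1.0, size=(n, m)),
                        rng.normal(1.0, np.sqrt(0.5), size=(n, m)))
    # true model s0 = {L*t : t = 1..p0}; beta_j = 1 for odd t, 1.3 for even t
    t = np.arange(1, p0 + 1)
    beta = np.zeros(p)
    beta[L * t - 1] = np.where(t % 2 == 1, 1.0, 1.3)
    # binary response through the complementary log-log link
    mu = -np.expm1(-np.exp(X @ beta))
    y = rng.binomial(1, mu)
    return X, beta, y
```

The shared-factor construction gives exactly the equicorrelated covariance $\Sigma_\rho$ without forming a $q \times q$ Cholesky factor. With `L=5`, the relevant features 5, 10 and 15 all fall in $s_1$, which is what makes three of them pairwise correlated in Setting 2.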

Setting 2. The same as Setting 1 except that $L = 5$. The essential difference between Setting 1 and this setting is that, in Setting 1, all the relevant features are independent while, in this setting, three of them have pairwise correlation $\rho$. Two values of $\rho$, 0.3 and 0.5, are considered in this setting.

Setting 3. $L = 10$, $q = 50$. In all the settings for $(n, p_n, p_{0n})$, this $q$ is much smaller than $p_n$, and $p_n - q$ is much bigger than $Lp_{0n}$. The distribution of the covariate vector $\mathbf{x}$ is specified as follows. For $j = 1, \dots, p_n - q$, the components $x_j$ are i.i.d. standard normal variables. For $p_n - q < j \le p_n$, $x_j = \sum_{t=1}^{p_{0n}} \cdots$, a combination of the relevant features $x_{Lt}$ and a noise term $\varepsilon_j$, where the $\varepsilon_j$'s are i.i.d. standard normal variables. The covariate vectors $\mathbf{x}_i$, $i = 1, \dots, n$, are generated as i.i.d. copies of $\mathbf{x}$. The specification of $s_{0n}$ and $\beta$ is the same as in Setting 1. In this setting, all the relevant features are independent, while the last $q$ irrelevant features, which are highly pairwise correlated, have a weak marginal correlation with each of the relevant features but a strong overall correlation with the totality of the relevant features.

We apply the forward selection procedure with EBIC in the simulation studies. In more detail, the procedure starts by fitting the GLIMs with one covariate; the covariate corresponding to the model with the smallest EBIC is the first selected variable. Then GLIMs with two covariates including the first selected variable are considered, and the additional covariate corresponding to the two-covariate model with the smallest EBIC is the second selected variable. The procedure continues in this way, and at each step one more covariate is selected. To reduce the amount of computation, when $p_n$ is bigger than 1000, the sure independence screening procedure based on the maximum marginal estimator (MME) (Fan and Song, 2010) is used to reduce the dimension of the feature space to 400 before the forward selection procedure is invoked. We consider four $\gamma$ values in EBIC, i.e., $\gamma_1 = 0$, $\gamma_2 = \frac{1}{2}\big(1 - \frac{\ln n}{2\ln p_n}\big)$, $\gamma_3 = 1 - \frac{\ln n}{4\ln p_n}$ and $\gamma_4 = 1$. We choose these values because $\gamma_1$ corresponds to the original BIC, $\gamma_4$ corresponds to mBIC, $\gamma_2$ is halfway between 0 and $1 - \frac{\ln n}{2\ln p_n}$, the lower bound of the consistent range of $\gamma$, and $\gamma_3$ is halfway between $1 - \frac{\ln n}{2\ln p_n}$ and 1. Thus we can evaluate the asymptotic behavior of EBIC when the $\gamma$ value is below and above the lower bound of the consistent range, and also make a comparison with BIC and mBIC. The performance of the procedure is evaluated by the positive discovery rate (PDR) and the false discovery rate (FDR), defined as follows:

$$\mathrm{PDR}_n = \frac{|s^* \cap s_{0n}|}{|s_{0n}|}, \qquad \mathrm{FDR}_n = \frac{|s^* \cap s_{0n}^c|}{|s^*|},$$

where $s^*$ is the set of selected features. Selection consistency is equivalent to

$$\lim_{n\to\infty} \mathrm{PDR}_n = 1 \quad \text{and} \quad \lim_{n\to\infty} \mathrm{FDR}_n = 0$$

in probability. The PDR and FDR are averaged over 200 replications. The results under Settings 1-3 are reported in Tables 5.1-5.3, respectively.
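A compact sketch of the forward selection procedure with EBIC described above, for a binary GLIM with the complementary log-log link, is given below. Several details are our own assumptions rather than the authors' code: models are fitted by Fisher scoring without an intercept (matching $\eta_i = \mathbf{x}_i^\top\beta$), the procedure stops as soon as the best one-covariate extension fails to lower the EBIC, the screening step for $p_n > 1000$ is omitted, and the FDR is set to 0 when nothing is selected. All function names and the example data are hypothetical.

```python
import math

import numpy as np


def _mu_dmu(X, beta):
    """cloglog inverse link mu(eta) and d mu / d eta, with eta clipped for numerical safety."""
    e = np.exp(np.clip(X @ beta, -30.0, 5.0))
    mu = np.clip(-np.expm1(-e), 1e-12, 1.0 - 1e-12)
    return mu, e * np.exp(-e)


def max_loglik(X, y, iters=50, tol=1e-8):
    """Maximised log-likelihood of the cloglog model on the columns of X (Fisher scoring)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        if beta.size == 0:                   # empty model: nothing to fit
            break
        mu, dmu = _mu_dmu(X, beta)
        v = mu * (1.0 - mu)
        score = X.T @ (dmu * (y - mu) / v)
        info = (X * (dmu**2 / v)[:, None]).T @ X      # expected information
        step = np.linalg.solve(info + 1e-10 * np.eye(beta.size), score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu, _ = _mu_dmu(X, beta)
    return float(np.sum(y * np.log(mu) + (1 - y) * np.log1p(-mu)))


def ebic(ll, k, n, p, gamma):
    log_tau = math.lgamma(p + 1) - math.lgamma(k + 1) - math.lgamma(p - k + 1)
    return -2.0 * ll + k * math.log(n) + 2.0 * gamma * log_tau


def forward_ebic(X, y, gamma):
    """Greedy forward selection; returns selected column indices in order of entry."""
    n, p = X.shape
    selected = []
    current = ebic(max_loglik(X[:, :0], y), 0, n, p, gamma)   # empty model
    while len(selected) < p:
        candidates = [j for j in range(p) if j not in selected]
        scores = [ebic(max_loglik(X[:, selected + [j]], y), len(selected) + 1, n, p, gamma)
                  for j in candidates]
        best = int(np.argmin(scores))
        if scores[best] >= current:          # assumed stopping rule
            break
        current = scores[best]
        selected.append(candidates[best])
    return selected


def pdr_fdr(selected, true):
    s, s0 = set(selected), set(true)
    return len(s & s0) / len(s0), (len(s - s0) / len(s) if s else 0.0)


# Hypothetical example: 3 relevant covariates among 20.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 20))
beta = np.zeros(20)
beta[[0, 5, 9]] = [1.0, -1.0, 1.0]
y = rng.binomial(1, -np.expm1(-np.exp(X @ beta)))
sel = forward_ebic(X, y, gamma=1.0)
print(sel, pdr_fdr(sel, [0, 5, 9]))
```

Passing a different `gamma` (for example one of $\gamma_1, \dots, \gamma_4$) changes only the penalty per added covariate, which is how the PDR/FDR trade-off in the tables arises.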

By examining Tables 5.1-5.3, we find the following common trends: (1) with all four $\gamma$ values, the PDR increases as $n$ gets larger; (2) with $\gamma_1$ and $\gamma_2$ (which are below the lower bound of the consistent range), the FDR does not show a decreasing trend while, with $\gamma_3$ and $\gamma_4$ (which are within the consistent range), the FDR decreases rapidly towards zero (in Settings 1 and 2 it is below 0.08 at $n = 500$, while in Setting 3 the decrease is slower); (3) though the PDRs with $\gamma_3$ and $\gamma_4$ are lower than those with $\gamma_1$ and $\gamma_2$ when the sample size is small, they become comparable as the sample size increases; and (4) the FDR with $\gamma_4$ is lower than that with $\gamma_3$ when the sample size is small, but so is the PDR; as the sample size gets larger, both the PDR and FDR with $\gamma_3$ become comparable to those with $\gamma_4$. These findings demonstrate that the selection consistency of EBIC is well realized in finite samples.

4 Real Data Analysis 

In this section, we apply the forward selection procedure with EBIC to analyze a leukemia data set. The data consist of the expression levels of 7129 genes obtained from 47 patients with acute lymphoblastic leukemia (ALL) and 25 patients with acute myeloid leukemia (AML). The data set is available in the R packages Biobase and golubEsets. The initial version of this data set is described and analyzed by a method called "neighborhood analysis" in Golub et al. (1999). The data set was later analyzed using a GLIM with probit link in Lee et al. (2003) and using a GLIM with logit link in Liao and Chin (2007). Fifty genes are identified as important ones affecting the types of leukemia in Golub et al. (1999), 27 genes in Lee et al. (2003), and 19 genes in Liao and Chin (2007). There are only a few overlapping genes among the three identified sets.

We analyzed the data by the forward selection procedure with four different link functions: logit, probit, cauchit and cloglog. First, with each link function, the procedure was carried out until 50 genes were selected. The identified genes are reported in Table 5.4. These 50 genes are compared with the three identified sets mentioned above; genes identified in Golub et al. (1999), Lee et al. (2003) and Liao and Chin (2007) are indicated by three different superscript markers, respectively. Three genes, 1834, 1882 and 6855, which are in all three identified sets, are selected by the forward selection procedure. They are all among the selected genes with the logit and cloglog links, whereas only two of them, 1834 and 1882, are among the selected genes with the probit and cauchit links. Except for two of them, the other selected genes are in only one of the identified sets. Note that the selected genes and their ordering differ among the four links, which indicates that the link function does matter in the selection procedure. Second, we used 8-fold cross-validation to select the optimal link function among the four links; the optimal link is the logit link. Finally, we made a final selection using EBIC with $\gamma = 1 - \frac{\ln n}{3\ln p_n}$, which is slightly bigger than the lower bound of the consistent range. The final selected variables, together with the maximum log-likelihood of the corresponding model, are reported in Table 5.5. To compare the final selection of the logit link with the other links, the selected results with all four links are reported. The genes selected by the logit link are 1834 and 4438. The maximum log-likelihood of the selected model with the logit link is the largest among all four links. Note that the same two genes are also selected by the probit link, and gene 4438 is selected by the cloglog link. We can thus conclude with reasonable confidence that the two genes selected by the logit link are the most important genes for studying the etiology of leukemia.
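The link comparison in this section can be sketched as follows, again under our own assumptions: a binary GLIM with the four inverse links, fitted by Fisher scoring with step-halving, and an 8-fold cross-validation score taken to be the held-out log-likelihood (the text does not state which CV criterion was used). The data in the usage lines are simulated placeholders, not the leukemia data, and all names are hypothetical.

```python
import math

import numpy as np

LINKS = ("logit", "probit", "cauchit", "cloglog")
_erf = np.vectorize(math.erf)


def inverse_link(name, eta):
    """Return mu(eta) and d mu / d eta for a binary GLIM with the named link."""
    if name == "logit":
        mu = 1.0 / (1.0 + np.exp(-eta))
        return mu, mu * (1.0 - mu)
    if name == "probit":
        return (0.5 * (1.0 + _erf(eta / math.sqrt(2.0))),
                np.exp(-0.5 * eta**2) / math.sqrt(2.0 * math.pi))
    if name == "cauchit":
        return 0.5 + np.arctan(eta) / math.pi, 1.0 / (math.pi * (1.0 + eta**2))
    if name == "cloglog":
        return -np.expm1(-np.exp(eta)), np.exp(eta - np.exp(eta))
    raise ValueError(f"unknown link {name!r}")


def loglik(X, y, beta, link):
    mu, _ = inverse_link(link, np.clip(X @ beta, -30.0, 30.0))
    mu = np.clip(mu, 1e-12, 1.0 - 1e-12)
    return float(np.sum(y * np.log(mu) + (1 - y) * np.log1p(-mu)))


def fit(X, y, link, iters=100, tol=1e-8):
    """Fisher scoring with step-halving; returns (beta_hat, maximised log-likelihood)."""
    beta = np.zeros(X.shape[1])
    ll = loglik(X, y, beta, link)
    for _ in range(iters):
        mu, dmu = inverse_link(link, np.clip(X @ beta, -30.0, 30.0))
        mu = np.clip(mu, 1e-12, 1.0 - 1e-12)
        v = mu * (1.0 - mu)
        score = X.T @ (dmu * (y - mu) / v)
        info = (X * (dmu**2 / v)[:, None]).T @ X
        step = np.linalg.solve(info + 1e-10 * np.eye(X.shape[1]), score)
        for _halving in range(30):           # never accept a step that lowers the likelihood
            new_ll = loglik(X, y, beta + step, link)
            if new_ll >= ll:
                break
            step = step / 2.0
        else:
            break
        beta, ll = beta + step, new_ll
        if np.max(np.abs(step)) < tol:
            break
    return beta, ll


def cv_loglik(X, y, link, folds=8, seed=0):
    """Held-out log-likelihood summed over K folds (assumed CV criterion)."""
    idx = np.random.default_rng(seed).permutation(len(y))
    total = 0.0
    for test in np.array_split(idx, folds):
        train = np.setdiff1d(idx, test)
        beta, _ = fit(X[train], y[train], link)
        total += loglik(X[test], y[test], beta, link)
    return total


# Placeholder data (intercept plus two covariates), not the leukemia data.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(300), rng.standard_normal((300, 2))])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ np.array([0.2, 1.0, -0.8])))))
scores = {link: cv_loglik(X, y, link) for link in LINKS}
print(max(scores, key=scores.get), scores)
```

Step-halving matters here because, as noted in Section 2, the log-likelihood under a non-canonical link need not be concave, so a plain Newton or Fisher step can overshoot.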
5 Appendix: Technical Proofs

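Proof of Lemma 1. For any $s \in \mathcal{A}_1$, consider $\tilde s = s \cup s_{0n}$. Let $a_{ni}$ in Lemma 1 of Chen and Chen (2012) be

$$a_{ni} = \frac{h''(\mathbf{x}_i^{\tilde s\top}\beta_0^{\tilde s})\,\mathrm{sign}(y_i - \mu_i)}{\sqrt{\sum_{i=1}^n [h''(\mathbf{x}_i^\top\beta_0)]^2}}.$$

Since $\mathbf{x}_i^{\tilde s\top}\beta_0^{\tilde s} = \mathbf{x}_i^\top\beta_0$, from Condition C6 we have

$$P\Big(\sum_{i=1}^n \big|(y_i - \mu_i)\, h''(\mathbf{x}_i^\top\beta_0)\big| > C n^{2/3}\Big) \le 2\exp(-C n^{1/3}). \quad (1)$$

Here and below, $C$ denotes a generic positive constant whose value may change from line to line. For any unit vector $u$,

$$\begin{aligned}
\big|u^\top H_{n0}(\beta_0^{\tilde s})\, u\big| &= \Big|\sum_{i=1}^n (y_i - \mu_i)\, h''(\mathbf{x}_i^\top\beta_0)\,(u^\top\mathbf{x}_i^{\tilde s})^2\Big| \\
&\le \sum_{i=1}^n \big|(y_i - \mu_i)\, h''(\mathbf{x}_i^\top\beta_0)\big|\,\|\mathbf{x}_i^{\tilde s}\|_2^2 \\
&\le C(k+1)p_{0n}\sum_{i=1}^n \big|(y_i - \mu_i)\, h''(\mathbf{x}_i^\top\beta_0)\big|. \quad (2)
\end{aligned}$$

The last inequality holds because $|\tilde s| \le (k+1)p_{0n}$ and all $x_{ij}$ are bounded, as assumed in Condition C6. Inequalities (1) and (2), together with Condition C5, imply that, for any $\xi > 0$, there exists a $\delta > 0$ such that

$$\begin{aligned}
&P\Big(\max_{s\in\mathcal{A}_1,\ \|u\|_2 = 1,\ \|\beta^{\tilde s} - \beta_0^{\tilde s}\|_2 < \delta} u^\top H_{n0}(\beta^{\tilde s})\, u > C p_{0n} n^{2/3}\Big) \\
&\quad\le P\Big(\max_{s\in\mathcal{A}_1,\ \|u\|_2 = 1} u^\top H_{n0}(\beta_0^{\tilde s})\, u > \frac{C}{1+\xi}\, p_{0n} n^{2/3}\Big) \\
&\quad\le |\mathcal{A}_1|\, P\Big(\sum_{i=1}^n \big|(y_i - \mu_i)\, h''(\mathbf{x}_i^\top\beta_0)\big| > C n^{2/3}\Big) \\
&\quad\le 2\exp\Big(k p_{0n}\ln p_n - \frac{C n^{1/3}}{1+\xi}\Big) = o(1).
\end{aligned}$$

A similar strategy applies to $s \in \mathcal{A}_0$, since $\mathcal{A}_0 = \{s \cup s_{0n} : s \in \mathcal{A}_1, \ 0 < |s| \le (k-1)p_{0n}\}$. That is, $\max_{s,u} u^\top H_{n0}(\beta^{s\cup s_{0n}})\, u = O_p(p_{0n} n^{2/3})$. Combined with $p_{0n} = o(n^{1/3})$ from Condition C1 and with Condition C4, which gives $u^\top H_{n1} u \ge c_1(1-\xi) n$, this yields $u^\top H_{n0} u / u^\top H_{n1} u = O_p(p_{0n} n^{-1/3}) = o_p(1)$, which is the desired result.

□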

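Proof of Theorem 1. According to the definition of EBIC, for any model $s$, $\mathrm{EBIC}_\gamma(s) \le \mathrm{EBIC}_\gamma(s_{0n})$ if and only if

$$\ln L_n(\hat\beta^s) - \ln L_n(\hat\beta^{s_{0n}}) \ge \frac{(|s| - p_{0n})\ln n}{2} + \gamma\big(\ln\tau(\mathcal{S}_{|s|}) - \ln\tau(\mathcal{S}_{p_{0n}})\big). \quad (3)$$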
To prove the selection consistency of EBIC, or mathematically,

$$P\Big(\min_{s:\,|s|\le kp_{0n},\, s\ne s_{0n}} \mathrm{EBIC}_\gamma(s) \le \mathrm{EBIC}_\gamma(s_{0n})\Big) \to 0 \quad \text{as } n \to +\infty,$$

it suffices to show that inequality (3) holds with a probability converging to 0 as the sample size goes to infinity, uniformly for all $s \in \mathcal{A}_0 \cup \mathcal{A}_1$. This is done by dealing with $s \in \mathcal{A}_0$ and $s \in \mathcal{A}_1$ separately.

(I) Case 1: $s \in \mathcal{A}_1$. In this case, inequality (3) implies that

$$\ln L_n(\hat\beta^s) - \ln L_n(\hat\beta^{s_{0n}}) \ge -p_{0n}\Big(\frac{\ln n}{2} + \gamma\ln p_n\Big). \quad (4)$$
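The step from (3) to (4) rests on three elementary facts, spelled out below as a worked inequality under the assumption $\gamma \ge 0$: $|s| \ge 0$, $\tau(\mathcal{S}_{|s|}) \ge 1$, and $\binom{p_n}{p_{0n}} \le p_n^{p_{0n}}$.

$$\frac{(|s|-p_{0n})\ln n}{2} + \gamma\big(\ln\tau(\mathcal{S}_{|s|}) - \ln\tau(\mathcal{S}_{p_{0n}})\big) \;\ge\; -\frac{p_{0n}\ln n}{2} - \gamma\ln\binom{p_n}{p_{0n}} \;\ge\; -p_{0n}\Big(\frac{\ln n}{2} + \gamma\ln p_n\Big).$$

Hence any $s \in \mathcal{A}_1$ satisfying (3) also satisfies (4).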
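Therefore, if we can show

$$P\Big(\sup_{s\in\mathcal{A}_1} \ln L_n(\hat\beta^s) - \ln L_n(\hat\beta^{s_{0n}}) \ge -p_{0n}\Big(\frac{\ln n}{2} + \gamma\ln p_n\Big)\Big) \to 0 \quad \text{as } n \to +\infty, \quad (5)$$

then we will have

$$P\Big(\min_{s\in\mathcal{A}_1} \mathrm{EBIC}_\gamma(s) \le \mathrm{EBIC}_\gamma(s_{0n})\Big) \to 0 \quad \text{as } n \to +\infty.$$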



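The key step is to derive the order of $\sup_{s\in\mathcal{A}_1} \ln L_n(\hat\beta^s) - \ln L_n(\hat\beta^{s_{0n}})$. For any $s \in \mathcal{A}_1$, let $\tilde s = s \cup s_{0n}$ and let $\tilde\beta^{\tilde s}$ be $\hat\beta^s$ augmented with zeros corresponding to the elements in $\tilde s \setminus s$. It can be seen that

$$\ln L_n(\beta_0^{\tilde s}) = \ln L_n(\beta_0^{s_{0n}}) \le \ln L_n(\hat\beta^{s_{0n}}), \qquad \ln L_n(\hat\beta^s) = \ln L_n(\tilde\beta^{\tilde s}),$$

which leads to

$$\sup_{s\in\mathcal{A}_1} \ln L_n(\hat\beta^s) - \ln L_n(\hat\beta^{s_{0n}}) \le \sup_{s\in\mathcal{A}_1} \ln L_n(\tilde\beta^{\tilde s}) - \ln L_n(\beta_0^{\tilde s}). \quad (6)$$

Also, by Condition C2,

$$\|\tilde\beta^{\tilde s} - \beta_0^{\tilde s}\|_2 \ge \|\beta_0^{s_{0n}\setminus s}\|_2 \ge \min_{j\in s_{0n}} |\beta_j| \ge C n^{-1/3}. \quad (7)$$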
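The positive definiteness of $H_n(\beta^{\tilde s})$, or the concavity of $\ln L_n(\beta^{\tilde s})$ in $\beta^{\tilde s}$, implies

$$\begin{aligned}
\sup_{s\in\mathcal{A}_1} \ln L_n(\tilde\beta^{\tilde s}) - \ln L_n(\beta_0^{\tilde s})
&\le \sup\big\{\ln L_n(\beta^{\tilde s}) - \ln L_n(\beta_0^{\tilde s}) : \|\beta^{\tilde s} - \beta_0^{\tilde s}\|_2 \ge C n^{-1/3},\ s \in \mathcal{A}_1\big\} \\
&\le \sup\big\{\ln L_n(\beta^{\tilde s}) - \ln L_n(\beta_0^{\tilde s}) : \|\beta^{\tilde s} - \beta_0^{\tilde s}\|_2 = C n^{-1/3},\ s \in \mathcal{A}_1\big\}.
\end{aligned}$$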

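To derive the order of the right-hand side of the above inequality, we take the Taylor expansion of $\ln L_n(\beta^{\tilde s}) - \ln L_n(\beta_0^{\tilde s})$:

$$\begin{aligned}
\ln L_n(\beta^{\tilde s}) - \ln L_n(\beta_0^{\tilde s})
&= (\beta^{\tilde s} - \beta_0^{\tilde s})^\top s_n(\beta_0^{\tilde s}) - \tfrac{1}{2}(\beta^{\tilde s} - \beta_0^{\tilde s})^\top H_{n1}(\beta^{*\tilde s})(\beta^{\tilde s} - \beta_0^{\tilde s}) \\
&\quad + \tfrac{1}{2}(\beta^{\tilde s} - \beta_0^{\tilde s})^\top H_{n0}(\beta^{*\tilde s})(\beta^{\tilde s} - \beta_0^{\tilde s}), \quad (8)
\end{aligned}$$

where $\beta^{*\tilde s}$ lies between $\beta^{\tilde s}$ and $\beta_0^{\tilde s}$. By Conditions C4 and C5, Lemma 1 implies that, uniformly for any $\beta^{\tilde s}$ such that $\|\beta^{\tilde s} - \beta_0^{\tilde s}\|_2 = C n^{-1/3}$, there exists a constant $c > 0$ such that, with probability tending to 1 as $n \to +\infty$,

$$\ln L_n(\beta^{\tilde s}) - \ln L_n(\beta_0^{\tilde s}) \le C n^{-1/3}\,\|s_n(\beta_0^{\tilde s})\|_2 - c\, n^{1/3}. \quad (9)$$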

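Now we need to find the uniform rate for the components of the score function $s_n(\beta_0)$. We claim that, under C1-C6,

$$P\Big(\max_{1\le j\le p_n} |s_{nj}(\beta_0)| > C n^{2/3}\Big) = o(1). \quad (10)$$

This claim can be seen from Lemma 1 in Chen and Chen (2012). For a fixed $j$, let

$$a_{ni} = \frac{x_{ij}\, h'(\mathbf{x}_i^\top\beta_0)}{\sqrt{\sum_{i=1}^n x_{ij}^2\,[h'(\mathbf{x}_i^\top\beta_0)]^2}}.$$

From Condition C6, we have

$$\begin{aligned}
P\big(s_{nj}(\beta_0) > C n^{2/3}\big) &= P\Big(\sum_{i=1}^n a_{ni}(y_i - \mu_i) > \frac{C n^{2/3}}{\sqrt{\sum_{i=1}^n x_{ij}^2 [h'(\mathbf{x}_i^\top\beta_0)]^2}}\Big) \\
&\le P\Big(\sum_{i=1}^n a_{ni}(y_i - \mu_i) > C n^{1/6}\Big) \le \exp(-C n^{1/3}).
\end{aligned}$$

The first inequality holds because of the boundedness of $x_{ij}$ and $h'$, which gives $\sum_{i=1}^n x_{ij}^2 [h'(\mathbf{x}_i^\top\beta_0)]^2 \le Cn$. Consequently, when $\ln p_n = o(n^{1/3})$, which is satisfied under C1, we have

$$\sum_{j=1}^{p_n} P\big(s_{nj}(\beta_0) > C n^{2/3}\big) \le \exp(\ln p_n - C n^{1/3}) = o(1),$$

and the same bound holds for $-s_{nj}(\beta_0)$. This completes the proof of claim (10).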

Therefore, with probability tending to 1, the right-hand side of (9) is less than $-C n^{1/3}$ for some constant $C > 0$. Combined with inequalities (6) and (7), this leads to

$$\sup_{s\in\mathcal{A}_1} \ln L_n(\hat\beta^s) - \ln L_n(\hat\beta^{s_{0n}}) < -C n^{1/3}.$$

Since, under C1, $p_{0n}\ln n = o(n^{1/3})$ and $p_{0n}\ln p_n = o(n^{1/3})$, we have proved inequality (5).

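(II) Case 2: $s \in \mathcal{A}_0$. Let $m = |s| - p_{0n}$. Lemma 1 in Luo and Chen (2011) implies that, asymptotically as $n \to +\infty$, $\mathrm{EBIC}_\gamma(s) \le \mathrm{EBIC}_\gamma(s_{0n})$ if and only if

$$\ln L_n(\hat\beta^s) - \ln L_n(\hat\beta^{s_{0n}}) \ge m[0.5\ln n + \gamma\ln p_n]. \quad (11)$$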
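Therefore, it suffices to show

$$P\Big(\ln L_n(\hat\beta^s) - \ln L_n(\hat\beta^{s_{0n}}) \ge m[0.5\ln n + \gamma\ln p_n] \text{ for some } s \in \mathcal{A}_0\Big) \to 0 \quad \text{as } n \to \infty \quad (12)$$

to obtain

$$P\Big(\min_{s\in\mathcal{A}_0} \mathrm{EBIC}_\gamma(s) \le \mathrm{EBIC}_\gamma(s_{0n})\Big) \to 0 \quad \text{as } n \to +\infty.$$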

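Note that, since $\ln L_n(\beta_0^s) = \ln L_n(\beta_0^{s_{0n}}) \le \ln L_n(\hat\beta^{s_{0n}})$, Lemma 1 implies

$$\begin{aligned}
\ln L_n(\hat\beta^s) - \ln L_n(\hat\beta^{s_{0n}}) &\le \ln L_n(\hat\beta^s) - \ln L_n(\beta_0^s) \\
&\le (\hat\beta^s - \beta_0^s)^\top s_n(\beta_0^s) - \tfrac{1}{2}(1-\xi)(\hat\beta^s - \beta_0^s)^\top H_{n1}(\beta^{*s})(\hat\beta^s - \beta_0^s), \quad (13)
\end{aligned}$$

where $\beta^{*s}$ lies between $\hat\beta^s$ and $\beta_0^s$, and $\xi$ is any arbitrarily small positive constant. Applying the conclusion of C5 to simplify the right-hand side of this inequality requires $\sup_{s\in\mathcal{A}_0} \|\hat\beta^s - \beta_0^s\|_2$ to approach 0 as $n$ goes to infinity. We claim that, under conditions C1-C6, uniformly for $s \in \mathcal{A}_0$,

$$\|\hat\beta^s - \beta_0^s\|_2 = O_p(n^{-1/3}). \quad (14)$$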

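We now prove this claim. For any unit vector $u$, let $\beta^s = \beta_0^s + n^{-1/3}u$. Denote

$$T = \Big\{\max_{s\in\mathcal{A}_0,\ \|u\|_2 = 1} u^\top H_{n0}(\beta^s)\, u \le C p_{0n} n^{2/3}\Big\};$$

then, since $P(T^c) = o(1)$ as shown in the proof of Lemma 1,

$$\begin{aligned}
&P\big(\ln L_n(\beta^s) - \ln L_n(\beta_0^s) > 0 \text{ for some } u, s \in \mathcal{A}_0\big) \\
&\quad\le P\big(\{\ln L_n(\beta^s) - \ln L_n(\beta_0^s) > 0 \text{ for some } u, s \in \mathcal{A}_0\} \cap T\big) + o(1).
\end{aligned}$$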

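On $T$, when $n$ is large enough, we have, uniformly for all $s \in \mathcal{A}_0$,

$$\begin{aligned}
\ln L_n(\beta^s) - \ln L_n(\beta_0^s) &= n^{-1/3}u^\top s_n(\beta_0^s) - \tfrac{1}{2}n^{1/3}u^\top\big(n^{-1}H_{n1}(\beta^{*s})\big)u + \tfrac{1}{2}n^{-2/3}\big(u^\top H_{n0}(\beta^{*s})\,u\big) \\
&\le n^{-1/3}u^\top s_n(\beta_0^s) - c_1(1-\xi)\,n^{1/3}/2 + O(p_{0n}) \\
&\le n^{-1/3}u^\top s_n(\beta_0^s) - c\,n^{1/3}. \quad (15)
\end{aligned}$$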

Hence, for some positive constant $c$, we have

$$P\big(\ln L_n(\beta^s) - \ln L_n(\beta_0^s) > 0 \text{ for some } u\big) \le P\big(u^\top s_n(\beta_0^s) > c\,n^{2/3} \text{ for some } u\big) + o(1).$$

From (10), we know that $\sum_{j=1}^{p_n} P\big(s_{nj}(\beta_0) > c\,n^{2/3}\big) = o(1)$. The same holds for the second term. Therefore,

$$P\big(\ln L_n(\beta^s) - \ln L_n(\beta_0^s) > 0 \text{ for some } u \text{ and } s \in \mathcal{A}_0\big) = o(1). \quad (16)$$

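Because $\ln L_n(\beta^s)$ is asymptotically concave in $\beta^s$ on this neighborhood (Lemma 1), the maximum likelihood estimator $\hat\beta^s$ exists and falls within the $n^{-1/3}$-neighborhood of $\beta_0^s$ uniformly for $s \in \mathcal{A}_0$. Thus we have $P\big(\|\hat\beta^s - \beta_0^s\|_2 = O(n^{-1/3})\big) \to 1$.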

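Now we can apply C5: the right-hand side of (13) can be upper bounded by

$$\frac{1+\varepsilon}{2}\, s_n^\top(\beta_0^s)\, H_{n1}^{-1}(\beta_0^s)\, s_n(\beta_0^s),$$

where $\varepsilon$ is an arbitrarily small positive value. Hence, the left-hand side of (12) is no more than

$$\begin{aligned}
&P\Big(\frac{1+\varepsilon}{2}\, s_n^\top(\beta_0^s) H_{n1}^{-1}(\beta_0^s) s_n(\beta_0^s) \ge m[0.5\ln n + \gamma\ln p_n] \text{ for some } s \in \mathcal{A}_0\Big) \\
&\quad\le |\mathcal{A}_0|\exp\big(-m(1-\varepsilon)[0.5\ln n + \gamma\ln p_n]\big) \quad (17) \\
&\quad\le \exp\Big(m\Big[\ln(p_n - p_{0n}) - (1-\varepsilon)\gamma\ln p_n - \frac{1-\varepsilon}{2}\ln n\Big]\Big).
\end{aligned}$$

This converges to 0 when $\gamma > \dfrac{1}{1-\varepsilon} - \dfrac{\ln n}{2\ln p_n}$.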
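To see where the threshold on $\gamma$ comes from, one can bound the exponent in (17) as follows; this is a worked step that assumes $p_n \to \infty$.

$$\ln(p_n-p_{0n}) - (1-\varepsilon)\gamma\ln p_n - \frac{1-\varepsilon}{2}\ln n \;\le\; (1-\varepsilon)\ln p_n\Big[\frac{1}{1-\varepsilon} - \frac{\ln n}{2\ln p_n} - \gamma\Big].$$

If $\gamma$ exceeds $\frac{1}{1-\varepsilon} - \frac{\ln n}{2\ln p_n}$ by a margin bounded away from zero, the right-hand side tends to $-\infty$, so each bound $\exp(m[\cdots])$ in (17) vanishes, and so does their sum over $m = 1, \dots, (k-1)p_{0n}$, which is dominated by a geometric series.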

□ 
Table 5.1: The PDR and FDR of the forward selection procedure with EBIC under simulation setting 1 (the PDR and FDR are averaged over 200 replicates; the numbers in parentheses are standard errors)

ρ    n     γ1 PDR   γ1 FDR   γ2 PDR   γ2 FDR   γ3 PDR   γ3 FDR   γ4 PDR   γ4 FDR
0    100   …        0.375    0.735    0.362    0.646    0.193    0.481    0.074
           …        …        …        …        …        …        …        …
     200   …        …        0.918    0.223    0.879    …        0.869    0.078
           …        …        …        …        …        …        …        …
     500   0.971    0.408    0.963    0.371    0.939    0.079    0.936    0.026
           (0.135)  (0.181)  (0.163)  (0.152)  (0.231)  (0.119)  (0.238)  (0.062)
0.3  100   0.708    0.407    0.708    0.398    0.621    0.196    0.471    0.081
           (0.298)  (0.296)  (0.298)  (0.306)  (0.384)  (0.230)  (0.442)  (0.152)
     200   0.933    0.281    0.924    0.239    0.889    0.143    0.855    0.083
           (0.202)  (0.248)  (0.232)  (0.212)  (0.303)  (0.161)  (0.344)  (0.111)
     500   0.969    0.428    0.959    0.354    0.938    0.047    0.933    0.014
           (0.130)  (0.169)  (0.177)  (0.138)  (0.238)  (0.091)  (0.247)  (0.048)
0.5  100   0.712    0.401    0.711    0.383    0.632    0.201    0.451    0.080
           (0.293)  (0.295)  (0.294)  (0.292)  (0.385)  (0.223)  (0.447)  (0.146)
     200   0.929    0.281    0.923    0.243    0.881    0.128    0.858    0.084
           (0.219)  (0.257)  (0.236)  (0.223)  (0.313)  (0.130)  (0.343)  (0.110)
     500   0.967    0.434    0.959    0.371    0.939    0.043    0.933    0.006
           (0.142)  (0.166)  (0.168)  (0.147)  (0.235)  (0.085)  (0.249)  (0.031)
0.7  100   0.674    0.432    0.674    0.414    0.606    0.244    0.430    0.092
           (0.291)  (0.289)  (0.291)  (0.287)  (0.365)  (0.241)  (0.432)  (0.144)
     200   0.931    0.292    0.926    0.248    0.888    0.148    0.874    0.112
           (0.196)  (0.246)  (0.218)  (0.207)  (0.295)  (0.146)  (0.314)  (0.125)
     500   0.970    0.427    0.966    0.365    0.937    0.032    0.934    0.010
           (0.134)  (0.173)  (0.150)  (0.150)  (0.234)  (0.072)  (0.240)  (0.038)
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Table 5.2: The PDR and FDR of the forward selection procedure with EBIC under 
simulation setting 2 (the PDR and FDR are averaged over 200 replicates, the numbers 
in parenthesis are standard errors) 





                  γ1                γ2                γ3                γ4
ρ     n      PDR      FDR      PDR      FDR      PDR      FDR      PDR      FDR
0.3   100    0.662    0.424    0.660    0.409    0.594    0.233    0.492    0.132
            (0.272)  (0.287)  (0.276)  (0.286)  (0.350)  (0.237)  (0.392)  (0.195)
      200    0.931    0.256    0.926    0.231    0.891    0.111    0.881    0.068
            (0.199)  (0.245)  (0.212)  (0.222)  (0.281)  (0.137)  (0.295)  (0.101)
      500    0.973    0.401    0.967    0.339    0.946    0.041    0.941    0.018
            (0.127)  (0.173)  (0.149)  (0.134)  (0.209)  (0.089)  (0.217)  (0.055)
0.5   100    0.571    0.489    0.570    0.478    0.521    0.304    0.442    0.189
            (0.259)  (0.274)  (0.261)  (0.276)  (0.303)  (0.265)  (0.337)  (0.230)
      200    0.918    0.272    0.910    0.239    0.888    0.121    0.869    0.081
            (0.204)  (0.256)  (0.230)  (0.231)  (0.267)  (0.148)  (0.293)  (0.122)
      500    0.970    0.402    0.964    0.351    0.946    0.056    0.942    0.021
            (0.129)  (0.183)  (0.148)  (0.153)  (0.199)  (0.115)  (0.212)  (0.062)
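The PDR and FDR in these tables are per-replicate rates averaged over 200 simulated data sets. Below is a minimal Python sketch of that bookkeeping. It assumes the usual definitions: PDR = |Ŝ ∩ S*| / |S*|, the share of the true covariates S* that are selected, and FDR = |Ŝ \ S*| / |Ŝ|, the share of selected covariates that are spurious. Several details are assumptions of this sketch and are not taken from the simulation code:

- the helper names `pdr_fdr` and `summarize`;
- the convention that an empty selection has FDR 0;
- using the sample standard deviation across replicates as the spread measure.

```python
from statistics import mean, stdev


def pdr_fdr(selected, true_set):
    """PDR and FDR for one replicate (assumed standard definitions)."""
    selected, true_set = set(selected), set(true_set)
    hits = len(selected & true_set)   # true covariates that were selected
    pdr = hits / len(true_set)        # share of true covariates recovered
    # Assumed convention: an empty selection makes no false discoveries.
    fdr = (len(selected) - hits) / len(selected) if selected else 0.0
    return pdr, fdr


def summarize(replicates, true_set):
    """Mean and sample standard deviation of PDR and FDR across replicates.

    The spread measure is an assumption of this sketch; the tables label
    their parenthesized values as standard errors.
    """
    rates = [pdr_fdr(s, true_set) for s in replicates]
    pdrs = [r[0] for r in rates]
    fdrs = [r[1] for r in rates]
    return (mean(pdrs), stdev(pdrs)), (mean(fdrs), stdev(fdrs))
```

Here is a toy example with true set {0, 1, 2, 3} and three replicates selecting [0, 1, 2, 3], [0, 1, 9] and [0, 1, 2, 3, 7, 8]. The per-replicate PDRs are 1, 1/2 and 1, and the FDRs are 0, 1/3 and 1/3. The averages are therefore 5/6 for PDR and 2/9 for FDR.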



Table 5.3: The PDR and FDR of the forward selection procedure with EBIC under
simulation setting 3 (the PDR and FDR are averaged over 200 replicates; the numbers
in parentheses are standard errors)





            γ1                γ2                γ3                γ4
n      PDR      FDR      PDR      FDR      PDR      FDR      PDR      FDR
100    0.586    0.506    0.586    0.484    0.524    0.332    0.387    0.198
      (0.258)  (0.252)  (0.258)  (0.253)  (0.316)  (0.252)  (0.366)  (0.239)
200    0.796    0.414    0.791    0.386    0.767    0.285    0.746    0.221
      (0.261)  (0.282)  (0.274)  (0.273)  (0.311)  (0.247)  (0.334)  (0.228)
500    0.946    0.479    0.936    0.416    0.912    0.195    0.896    0.171
      (0.167)  (0.165)  (0.197)  (0.150)  (0.248)  (0.185)  (0.269)  (0.176)
Table 5.4: Analysis of Leukemia Data: the top 50 genes selected by the forward
selection procedure with the four links: logit (lo), probit (pr), cauchit (ca) and cloglog
(cl)



Rank and Gene ID

Rank  1        2        3        4        5        6        7        8        9        10
lo    1834*^*  4438     4951     6539*    155      2181     1882*^*  6472     65       1953
pr    1834*^*  4438     4951     155      5585     5466     706      7119*    3119     4480
ca    1882*^*  4951     6281*    4499     4443     6539*    5107     1834*^*  4480     6271
cl    1834*^*  6855*^*  4377     5122     2830     4407     4780     6309     4973*    715

Rank  11       12       13       14       15       16       17       18       19       20
lo    3692     706      1787     5191*    1239     3119     2784     1078     3631     6308
pr    6201^    490      6895     1882*^*  1809     2855     3123     4211*    2020**   3631
ca    6378     3631     2111*    6201^    6373*    1800     4780     321      4107^    1779^
cl    5376     930      1800     1882*^*  5794     4399     4389*    922      1962     4267

Rank  21       22       23       24       25       26       27       28       29       30
lo    6373*    1909*    4153     1685^    6855*^*  7073     5539     2830     4819     6347
pr    5823     1953     1745^*   65       997      1928*    3307     1787     538      5539
ca    6277     1544     5254*    1928*    1745^*   3163     7073     310      4389*    5146
cl    1926     4229     5254*    770      2141     6923     7073     2828     4847*    698

Rank  31       32       33       34       35       36       37       38       39       40
lo    [?]      1095     5328     4279     4373     5737     4366     5280     [?]      [?]
pr    4107     2385     1087     1909*    5376     5552     6005     1604     [?]      [?]
ca    1927     885      3137     2258     4334     6657     2733     5336     [?]      [?]
cl    1779     1928*    4049     876      6857     6347     6376*    2361     [?]      [?]

Rank  41       42       43       44       45       46       47       48       49       50
lo    6676     4291     1945     4079     3722     668      782      4196*    [?]      [?]
pr    6702     6309     2348*    4282     4925     6167     2323     1779     5122     3847*
ca    4229     4328*    715      4149     5191*    6283     200      6702     5794     4190
cl    3631     6308     4499     4480     5971     6510     5300     3475     3932     6801
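The four links compared in Table 5.4 differ only in the inverse link h, which maps the linear predictor η = xᵀβ to P(Y = 1). The logit link is the canonical link for binary data; probit, cauchit and cloglog are non-canonical. Below is a small Python sketch of these standard inverse links and the Bernoulli log-likelihood each one induces. The names `INVERSE_LINKS`, `bernoulli_loglik` and `EPS` are illustrative assumptions, not taken from the paper's code.

```python
import math

# Hypothetical clipping constant: keeps log() finite when fitted
# probabilities reach 0 or 1 (e.g. under near-perfect separation).
EPS = 1e-15

# Inverse link h(eta) = P(Y = 1 | eta) for each link.
# Note: math.exp can overflow for |eta| beyond roughly 700; fine for a sketch.
INVERSE_LINKS = {
    "logit": lambda eta: 1.0 / (1.0 + math.exp(-eta)),                   # logistic CDF (canonical)
    "probit": lambda eta: 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0))),  # standard normal CDF
    "cauchit": lambda eta: 0.5 + math.atan(eta) / math.pi,               # standard Cauchy CDF
    "cloglog": lambda eta: 1.0 - math.exp(-math.exp(eta)),               # complementary log-log
}


def bernoulli_loglik(y, eta, link):
    """Sum of y*log(mu) + (1 - y)*log(1 - mu) with mu = h(eta) under the chosen link."""
    h = INVERSE_LINKS[link]
    total = 0.0
    for yi, ei in zip(y, eta):
        mu = min(max(h(ei), EPS), 1.0 - EPS)  # clip away from 0 and 1
        total += yi * math.log(mu) + (1 - yi) * math.log(1.0 - mu)
    return total
```

At η = 0 the logit, probit and cauchit links all give probability 1/2, because their underlying distributions are symmetric about zero. The cloglog link gives 1 - e^(-1) ≈ 0.632, reflecting its asymmetry.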




Table 5.5: Analysis of Leukemia Data: the genes finally selected by EBIC

Link Function   Selected Genes   Maximum Log-likelihood
logit           1834, 4438       -2.296e-08
probit          1834, 4438       -3.022e-08
cauchit         1882, 4951       -2.122e-06
cloglog         1834, 6855       -6.908e-08
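Table 5.5 appears to report the model at which EBIC settles along each forward-selection ranking of Table 5.4. Below is a sketch of that final step. It assumes the standard form EBIC_γ(s) = -2 ℓ(β̂_s) + |s| log n + 2γ log C(p, |s|), where ℓ is the maximized log-likelihood, n the sample size and p the number of candidate genes. The sketch evaluates EBIC on every prefix of the forward path and returns the minimizer, which may differ in detail from the stopping rule used in the paper. The function names and the numbers in the usage example are hypothetical.

The log-likelihoods in Table 5.5 are essentially zero. This means the two selected genes already give fitted probabilities of about 1 to the observed classes. Adding further genes cannot improve the fit term, while each one still adds to the penalty.

```python
import math


def log_binom(p, k):
    """log C(p, k) via log-gamma, which stays finite for large p."""
    return math.lgamma(p + 1) - math.lgamma(k + 1) - math.lgamma(p - k + 1)


def ebic(loglik, k, n, p, gamma):
    """Assumed standard form: -2*loglik + k*log(n) + 2*gamma*log C(p, k)."""
    return -2.0 * loglik + k * math.log(n) + 2.0 * gamma * log_binom(p, k)


def choose_size(path_logliks, n, p, gamma):
    """Pick the number of forward steps with the smallest EBIC.

    path_logliks[k] is the maximized log-likelihood of the model holding the
    first k covariates entered by forward selection (k = 0 is the null model).
    """
    scores = [ebic(ll, k, n, p, gamma) for k, ll in enumerate(path_logliks)]
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, scores
```

Consider hypothetical path log-likelihoods [-40.0, -10.0, -1e-6, -1e-7] with n = 50, p = 1000 and γ = 1. In this case `choose_size` returns 2. The third covariate barely changes the fit term, but it raises the penalty by log 50 + 2 log(998/3) ≈ 15.5.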